[WIP] Gensim docker container #1368
| # Installs python, pip and setup tools (with fixed versions) | ||
| RUN apt-get update \ | ||
| && apt-get install -y \ | ||
| ant=1.9.6-1ubuntu1 \ |
@tmylk @menshikh-iv Should I remove version pinning from here also?
If pinning is not required then, of course, you can remove it.
Pinning versions in a container is actually very good. It guarantees that it keeps running.
Just don't pin the gensim version.
So please keep the pins here
Guys, the pinning is needed to follow Docker best practices. Without pinned versions, the container can break when a component changes its version and someone then tries to get it with docker pull.
Another thing: in my previous PR I forgot to include a Dockerfile change that pins some of git's dependencies to avoid these problems.
I was following the docker contribution guidelines to turn this image into an official one in the future.
Thanks @danielbdias, I've kept the pinned versions that were already in your PR and will add version pinning for the rest of the packages, which I added later.
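As a rough illustration of the pinning convention discussed above, the sketch below flags `apt-get install` arguments that carry no `=version` pin. The helper name `unpinned_packages` is hypothetical and not part of this PR, and the `git` entry is only an illustrative unpinned package; the `ant` pin is the one shown in the diff.

```python
def unpinned_packages(install_args):
    """Return the `apt-get install` arguments that name a package without `=version`."""
    return [
        arg for arg in install_args
        if not arg.startswith("-")  # skip flags such as -y
        and "=" not in arg          # a pin looks like name=version
    ]


# "ant=1.9.6-1ubuntu1" is the pin from the diff above; "git" is an
# illustrative unpinned entry, not a line taken from the Dockerfile.
print(unpinned_packages(["-y", "ant=1.9.6-1ubuntu1", "git"]))  # prints ['git']
```

A check like this could run before building the image, so that a newly added package without a pin is noticed early.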
tmylk
left a comment
The container is a much-needed feature; I just left some comments to make things clearer.
| && mv ./gensim-$GENSIM_VERSION/* /gensim \ | ||
| && rm -rf /gensim/download \ | ||
| && cd /gensim \ | ||
| && python setup.py install |
Please check out the master branch first.
It already downloads the package from the branch specified in the GENSIM_VERSION variable (for now I use a branch on my personal fork, which has the docker folder).
| ENV WR_HOME gensim_dependencies/wordrank | ||
| ENV MALLET_HOME gensim_dependencies/mallet | ||
| ENV DTM_PATH gensim_dependencies/dtm/bin/dtm-linux64 | ||
| ENV VOWPAL_WABBIT_PATH gensim_dependencies/vowpal_wabbit/vowpalwabbit/vw |
Need to add varembed path in order to run varembed tests.
According to test_varembed_wrapper.py, the varembed path is not required. It only tests against these test_data files.
| import sys | ||
|
|
||
| try: | ||
| from gensim.models.word2vec_inner import FAST_VERSION, MAX_WORDS_IN_BATCH |
There is no need for MAX_WORDS_IN_BATCH.
Removed in latest commit
@parulsethi What's the status of this PR? Are you having any problems with the container?
Update: The only thing left now is a proper installation of the wordrank dependencies (OpenMPI needs to be installed with multithreading enabled).
|
||
| ENV GENSIM_REPOSITORY https://github.com/parulsethi/gensim/archive | ||
| ENV GENSIM_BRANCH gensim_docker | ||
|
|
Add gensim version as env variable
| && python3 setup.py install | ||
|
|
||
| # Set ENV variables for wrappers | ||
| ENV FT_HOME gensim_dependencies/fastText |
Replace the relative paths with absolute ones (for all *_PATH and *_HOME variables); it's very important.
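A relative value such as `gensim_dependencies/fastText` resolves against whatever the current working directory happens to be, so a wrapper started from another directory would not find its binary. The sketch below shows one way to spot such values; the helper name `relative_wrapper_paths` is hypothetical, and the absolute `MALLET_HOME` value is an assumption based on the `/gensim/gensim_dependencies` directory used in the Dockerfile.

```python
import os


def relative_wrapper_paths(environ, suffixes=("_HOME", "_PATH")):
    """Return the names of *_HOME / *_PATH variables whose values are relative paths."""
    return sorted(
        name for name, value in environ.items()
        if name.endswith(suffixes) and not os.path.isabs(value)
    )


# FT_HOME uses the relative value from the diff above; the absolute MALLET_HOME
# value is an assumption built from the /gensim/gensim_dependencies directory
# that the Dockerfile changes into.
example = {
    "FT_HOME": "gensim_dependencies/fastText",
    "MALLET_HOME": "/gensim/gensim_dependencies/mallet",
}
print(relative_wrapper_paths(example))  # prints ['FT_HOME']
```

Passing `os.environ` instead of the example dictionary would check the variables actually set inside the container.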
|
|
||
| try: | ||
| from gensim.models.word2vec_inner import FAST_VERSION | ||
|
|
Please add more info to this script's output (such as the Python version and the numpy/scipy/gensim versions) and create an alias for this script in Docker.
It already runs from here in the Dockerfile with both Python 2 and 3, and only the pinned versions of numpy/scipy/gensim installed in the container are used.
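For reference, the kind of extra output requested above could look roughly like the sketch below. It is not the script from this PR: the function name `describe_environment` is hypothetical, and the only names taken from the diff are `gensim.models.word2vec_inner` and `FAST_VERSION`. Packages that cannot be imported are reported rather than raising.

```python
import importlib
import platform


def describe_environment(packages=("numpy", "scipy", "gensim")):
    """Collect the interpreter version and the versions of the given packages."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # Report any import failure instead of aborting the diagnostic.
            info[name] = "not installed"
    try:
        # FAST_VERSION is the name imported in the diff above; it is only
        # importable when gensim's compiled word2vec extension is present.
        from gensim.models.word2vec_inner import FAST_VERSION
        info["FAST_VERSION"] = FAST_VERSION
    except Exception:
        info["FAST_VERSION"] = "unavailable"
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{}: {}".format(key, value))
```

The code avoids Python 3-only syntax so that, like the original check, it can run under both interpreters in the container.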
| @@ -0,0 +1,7 @@ | |||
| version: '2' | |||
Why do you need a docker-compose file?
| @@ -0,0 +1,19 @@ | |||
| #!/bin/bash | |||
|
|
|||
| printf "1. clean up workspace\n" | |||
Why did you add this script here? It comes from the wordrank repo.
|
|
||
| # Install fastText | ||
| RUN cd /gensim/gensim_dependencies \ | ||
| && git clone https://github.com/facebookresearch/fastText.git \ |
Please pin the versions of FastText/Wordrank/etc. (you can use a commit hash or a version).
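One way to pin a git dependency is to clone it and then check out a fixed commit. The sketch below only builds and runs the corresponding git commands; the helper names are hypothetical, and the commit hash is left as a parameter because no specific hash is given in this thread.

```python
import subprocess


def pinned_clone_commands(repo_url, commit, target_dir):
    """Build the git commands that clone a repository and check out one fixed commit."""
    return [
        ["git", "clone", repo_url, target_dir],
        ["git", "-C", target_dir, "checkout", commit],
    ]


def clone_pinned(repo_url, commit, target_dir):
    """Run the commands; check_call raises CalledProcessError if git fails."""
    for command in pinned_clone_commands(repo_url, commit, target_dir):
        subprocess.check_call(command)
```

In a Dockerfile the same two steps would typically be chained in a single RUN instruction, with the chosen hash written out explicitly.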
| && cd /gensim/gensim_dependencies/fastText \ | ||
| && make | ||
|
|
||
| # Install WordRank |
Comment out everything related to wordrank in the Dockerfile (OpenMPI problem).
| python2 setup.py test | ||
| ``` | ||
|
|
||
| To push the image to docker hub: |
This block (about pushing to Docker Hub) is not needed.
| docker push [my_user]/gensim | ||
| ``` | ||
|
|
||
| # Run gensim image from anywhere |
Replace with "Run ipython notebook with installed gensim".
Nice feature, thank you @parulsethi 🥇
* added dockerfile
* remove fasttext from pip installs
* remove syntax errors
* remove unused imports
* modified dockerfile
* add subversion, locales
* use both python2 and python3
* upgrade numpy version
* add readme with relevant commands
* add fixed versions for wrapper dependencies
* made requested changes
* update readme
* change vw pin and remove docker-yml
* change vw version and make absolute paths for wrappers
* specify original gensim repo for download
* change maintainer
* correct missing slash
* use git clone for gensim
* correct gensim folder sequences
TODO: